C/C++ Users Group Library 1996 July

home *** CD-ROM | disk | FTP | other *** search

/ C/C++ Users Group Library 1996 July / C-C++ Users Group Library July 1996.iso / vol_400 / 423_01 / rdcf / dosfiles.doc < prev next >

Wrap

Text File | 1993-01-14 | 26.1 KB | 560 lines

A Description of the DOS File System by Philip J. Erdelsky PLAIN VANILLA CORPORATION CompuServe 75746,3411 InterNet 75746.3411@compuserve.com January 15, 1993 Copyright (c) 1993 Plain Vanilla Corporation 1. Introduction --------------- The DOS file system is used not only by IBM PC's and compatibles, but also by some standalone word processors that use diskettes and by a number of other devices. Even computer systems that normally use a different file system, such as UNIX-based workstations, also have the ability to read and write diskettes in DOS format. All of these uses together make the DOS file system the single most widely used file system in the world. The file system has evolved a great deal since its origin in DOS 1.0. The version described here is essentially that used by DOS 3.3. The large partitions used by DOS 4.0 and later versions of DOS are not included. This description was written in conjunction with version 2.0 of the Plain Vanilla Reentrant DOS-Compatible File System, an implementation of the basic DOS file system written in portable C and distributed as free software. There are a number of details regarding the DOS file system that are undocumented. Some of them have been filled in by experimentation; others are simply described as undefined; 2. Block Devices ---------------- A DOS file system resides on what is generally called a block device. This is usually a disk of some kind, but it may also be a part of regular or extended memory that has been configured as a RAM disk. A block device stores information in sectors of equal size. The most common sector size is 512 bytes, which is used by all standard DOS floppy diskettes, but some RAM disks and hard disks may use sectors containing 128, 256 or 1024 bytes. The sectors are numbered in a very simple way. Each sector has a logical sector number. The smallest logical sector number is zero, and the largest logical sector number is one less than the number of sectors. Every logical sector number between these limits is assigned to one other sector. There are no gaps in the numbering system. 3. Mapping Sides and Tracks --------------------------- The logical sector numbering system usually hides some complexities that must be handled by the disk controller hardware, the disk driver software or both. The sectors on a disk are arranged into concentric circles called tracks. Most floppy disks have tracks on both sides. A hard disk has more than two sides, because it consists of two or more disks, rotating in unison on the same spindle. Some disk controllers, especially hard disk controllers, accept logical sector numbers and make all conversions internally; but this is not always the case. To read or write a sector, the disk controller first moves a reading and writing device, called a head, back or forth until it is positioned over the track containing the desired sector. If the disk has tracks on more than one side, there is head for each side of the disk, and they move together. The disk controller selects the proper head by electronic switching. Finally, the disk controller waits until the disk rotation brings the desired sector under the head. Then it reads or writes the sector. To convert a logical sector number L into the corresponding side, track and sector numbers, the controller (or the software driver) usually needs the following three numbers: N = the number of sectors per track, T = the number of tracks per side, and S = the number of sides. The tracks are usually numbered from 0 to T-1, with track 0 on the outside and track T-1 on the inside. This puts the most frequently accessed sectors (those with low logical sector numbers) on the outside of the disk, where sectors are largest and the disk is the most reliable. The sides are usually numbered from 0 to S-1, in an order determined by the hardware design. The sectors in a single track are numbered from 1 to N, where the disk rotation carries them under a head in order of increasing sector number. The numbering starts with 1, not 0. The reason for this odd convention is not clear, but it is followed by a number of floppy disk drives, including the ones usually used with DOS-based computers. The assignment of logical sector numbers to sectors on a disk follows a simple scheme with a straightforward object -- to minimize access time when the sectors are read or written in order of ascending logical sector numbers. This usually means minimizing the number of times the heads must be moved, since this is by far the slowest kind of disk operation. The formulas are as follows, where A rem B means the remainder when A is divided by B: sector number = L rem N + 1 side number = (L/N) rem S track number = (L/N) / S Logical sector number 0 is sector 1 in track 0 on side 0. Since this sector can always be located without knowing the disk structure, it is often used to store information about the disk structure. This is important for floppy disks, which are removable. Many floppy disk drives can accommodate disks with more than one format. The file system codecan read logical sector 0 to find out how to locate other sectors. As the logical sector number increases, the heads must be moved only when the track number increases. This does not happen until all sectors with a particular track number have been accessed. The number of tracks is needed only to check the track number to see whether it is in range. This is advisable in some cases because some disk drives may be damaged if an attempt is made to move the heads beyond their normal range. 4. Handling Bad Sectors ----------------------- On any large hard disk, there are apt to be some bad spots. Some hard disk controllers can remap the bad spots and make the disk appear to be a perfect disk with a slightly smaller capacity. This is usually done during low-level formatting. However, floppy disk controllers and some hard disk controllers cannot do this. To handle most cases of this kind, the DOS file system can be set up so that some sectors are marked as bad. No DOS file information will be written into bad sectors. This lowers the disk capacity, but it makes imperfect disks usable. This is usually done when the disk is formatted. If there is a bad sector in a critical area of the disk, such as the part that indicates which sectors are bad, the disk cannot be used to hold a DOS file system. Fortunately, the critical area of a disk is quite small, so this is quite unlikely. 5. Disk Partitions ------------------ The original DOS file system used 16-bit logical sector numbers, so it could handle only file systems with at most 65535 sectors, or 32 megabytes when 512-byte sectors were used. This was the infamous 32-megabyte barrier. DOS could use a disk with more than 65535 sectors, but it could use only 65535 sectors for its file system. A simple solution to this problem is to divide the disk into two or more partitions, each of which contains no more than 65535 sectors, and to treat each partition as though it were a separate disk. DOS 4.00 and later versions of DOS can use 32-bit logical sector numbers, so they can handle larger partitions. This would seem to imply an 8-terabyte maximum, but other limitations make the practical maximum smaller than that. This document covers only partitions of 32 megabytes or less that use 16-bit logical sector numbers. 6. The Layout of a DOS Disk --------------------------- The sectors of a DOS disk are divided into the following groups, which are listed in order of increasing logical sector numbers: RESERVED SECTORS FILE ALLOCATION TABLES ROOT DIRECTORY DATA AREA HIDDEN SECTORS The groups are described in detail in the following sections. 7. Reserved Sectors ------------------- The first reserved sector, which is always logical sector 0, is also called the bootstrap sector. Here is what is in a 512-byte bootstrap sector: byte(s) contents ------- ------------------------------------------------------- 0-2 first instruction of bootstrap routine 3-10 OEM name 11-12 number of bytes per sector 13 number of sectors per cluster 14-15 number of reserved sectors 16 number of copies of the file allocation table 17-18 number of entries in root directory 19-20 total number of sectors 21 media descriptor byte 22-23 number of sectors in each copy of file allocation table 24-25 number of sectors per track 26-27 number of sides 28-29 number of hidden sectors 30-509 bootstrap routine and partition information 510 hexadecimal 55 511 hexadecimal AA Two-byte fields are always arranged in "little endian" order, with the less significant byte first. Notice that some two-byte fields are not aligned on even addresses. This may require special treatment by CPU's such as the Motorola 68000, which cannot directly access words at odd addresses. The first instruction of the bootstrap routine is always a jump instruction that transfers control to the bootstrap routine in bytes 30-509. The OEM name is usually the version of DOS or the name of the utility that was used to format the disk, represented by eight ASCII characters. It may even be completely blank. DOS apparently makes no use of the OEM name. The total number of reserved sectors, including the bootstrap sector, is specified by the contents of bytes 14 and 15. Reserved sectors other than the bootstrap sector are not used by DOS. Most DOS disks have only one reserved sector, but a larger number may be used to reserve space for a non-DOS partition or to cause DOS to avoid one or more bad sectors. For most 5-1/4 inch and 3-1/2 inch diskettes, the media descriptor byte conveys some the of same information redundantly: size 5-1/4 5-1/4 5-1/4 5-1/4 5-1/4 3-1/2 3-1/2 density (D = double, H = high) D D D D H D H sides 1 1 2 2 2 2 2 media descriptor byte FE FC FF FD F9 F9 F0 bytes per sector 512 512 512 512 512 512 512 sectors per cluster 1 1 2 2 1 2 1 reserved sectors 1 1 1 1 1 1 1 copies of file allocation table 2 2 2 2 2 2 2 entries in root directory 64 64 112 112 224 112 224 total sectors 320 360 640 720 2400 1440 2880 sectors per file allocation table 1 2 1 2 7 3 9 sectors per track 8 9 8 9 15 9 18 hidden sectors 0 0 0 0 0 0 0 In addition, the media descriptor byte is repeated in the file allocation table, which is described below. On diskettes formatted with only 8 sectors per track, the information given in bytes 3-29 is absent, and DOS must examine the media descriptor byte in the file allocation table to determine the exact disk format. What DOS will do with a diskette that contains nonstandard but consistent parameters in its bootstrap sector is undefined. In some cases it will read and write the diskette properly; in other cases it will not. 8. File Allocation Tables ------------------------- The sectors immediately after the last reserved sector hold one or more copies of the file allocation table (sometimes called the FAT). The number of copies is specified by byte 16 in the bootstrap sector, and the number of sectors in each copy is specified by bytes 22 and 23 in the bootstrap sector. DOS uses only the first copy of the file allocation table, but it updates other copies, if any, to keep them identical to the first copy. Then if the first copy develops a bad sector, a special file recovery program can be used to retrieve the information from another copy. Most floppy and hard disks contain two copies of the file allocation table. Most RAM disks contain only one because RAM is a much more reliable medium. Versions of DOS before 3.00 used a file allocation table with 12-bit entries. Each group of three consecutive bytes contains two 12-bit entries, arranged as follows: The first byte contains the eight least significant bits of the first entry. The four least significant bits of the second byte contain the four most significant bits of the first entry. The four most significant bits of the second byte contain the four least significant bits of the second entry. The third byte contains the eight most significant bits of the second entry. In other words, if UV, WX and YZ are the hexadecimal representations of the three consecutive bytes, then the entries are WUV and YZX, respectively. This odd arrangement is actually quite easy to implement in 8086 assembly language. Yes, a file allocation table entry can be split between sectors. A file allocation table with 12-bit entries is suitable for floppy diskettes, but proved inadequate for hard disk partitions, even those within the 32-megabyte limit. With DOS 3.0, an alternative format with 16-bit entries was introduced. A 16-bit entry occupies two consecutive bytes, stored in "little endian" order, with the less significant byte first. The file allocation table entries are numbered consecutively. Entry 0 (the first entry) contains the media descriptor byte (byte 21 in the bootstrap sector), padded at the more significant end with ones to make it a 12-bit or 16-bit value. Entry 1 contains either 12 or 16 ones. For file allocation purposes, the data area of the disk is divided into a number of clusters. Each cluster is the same size, and consists of the number of consecutive sectors specified in byte 13 of the bootstrap sector. The clusters are numbered consecutively, starting with cluster number 2, which lies at the beginning of the data area. Each entry in the file allocation table, except entries 0 and 1, is associated with the cluster with the same number as the entry. This is quite consistent, because there are no clusters with number 0 or 1. The contents of each such entry are interpreted as follows: contents in hexadecimal 12-bit 16-bit meaning ------- --------- --------------------------------------------------- 000 0000 cluster is unassigned and available 001 0001 invalid entry 002-FEF 0002-FFEF cluster is assigned, contents is the number of the next cluster in the same file or subdirectory FF0-FF6 FFF0-FFF6 reserved FF7 FFF7 cluster contains a bad sector FF8-FFF FFF8-FFFF cluster is the last cluster assigned to a file or subdirectory Whether 12-bit or 16-bit entries are used depends on the number of clusters in the data area. If the range from 002 to FEF is sufficient to number them all, 12-bit entries are used. Otherwise, 16-bit entries are used. Notice that the restrictions on cluster size and the number of clusters put an upper limit on the partition size. Each subdirectory, and each file that contains at least one byte of data, is associated with a chain of file allocation table entries, one for each cluster. The directory entry for the file or subdirectory, which is described below, contains the number of the first cluster. The entry for each cluster other than the last cluster contains the number of the next following cluster. The entry for the last cluster contains a value in the range FF8-FFF or FFF8-FFFF, which marks the end of the chain. 9. Root Directory ----------------- The root directory contains one 32-byte entry for each file in the root directory of the disk. If the disk has a volume label, the root directory also contains a 32-byte entry for that. The maximum number of root directory entries is specified in bytes 17 and 18 of the bootstrap block. This number, together with the sector size, determines the number of sectors allocated to the root directory. The field in bytes 17 and 18 of the bootstrap sector are always chosen so the number of root directory sectors is a whole number. What DOS will do otherwise is undefined. 10. The Data Area and Hidden Sectors ------------------------------------ The total number of sectors given in bytes 19 and 20 of the bootstrap sector includes all reserved sectors and all sectors in the root directory, the data area and all copies of the file allocation table. It does not include the hidden sectors. The size of the data area is not given directly, but it can be calculated from this information. Apparently it must be an exact multiple of the cluster size. What DOS will do otherwise is undefined. The total number of sectors, plus the number hidden sectors given in bytes 28-29 of the bootstrap sectors, is the true total number of sectors in the partition. This is equal to the product of the number of sectors per track, the number of tracks per side, and the number of sides. The number of tracks per side is not given, but it can be easily calculated from this information. Usually, the hidden sectors are simply a few leftover sectors that could not be included in the data area because of various format constraints, but a large number of hidden sectors could be used to conceal a second partition. 11. Directory Structure ----------------------- Each directory entry, whether in the root directory or in a subdirectory, contains the following fields: byte(s) contents ------- ---------------------------------------------------- 0-7 file name or first 8 characters of volume name 8-10 file extension or last 3 characters of volume name 11 attribute byte 12-21 unused 22-23 time 24-25 date 26-27 number of first cluster 28-31 number of bytes in file, or zero for subdirectory or volume label The bits in the attribute byte determine the type of entry as follows (bit 7 is the most significant bit): bit meaning if bit = 1 --- --------------------------------------- 7 unused 6 unused 5 file has been changed since last backup 4 entry represents a subdirectory 3 entry represents a volume label 2 system file 1 hidden file 0 read-only If the entry represents a subdirectory, bit 4 is set and all other bits are unused. If the entry represents a volume label, bit 3 is set and other bits are unused. Since DOS file names are case-insensitive, the file name and extension contain no lower-case (small) letters. They are always converted to the corresponding capital letters before writing them to the disk. What DOS will do if it finds small letters in a file name or extension in a directory entry is undefined. The file name and extension are padded with ASCII blanks (hexadecimal 20) at the end if necessary to fill their respective fields. Subdirectory names and extensions follow the same rules. A volume name, however, is case-sensitive, and small letters are permitted. The name is always 11 characters long, and is apparently also padded with blanks to fill out its field. When a subdirectory is created, or when a file or volume label is created or modified, DOS writes the current time and date into the time and date fields. The time field is a two-byte value, stored in "little endian" order, with the less significant byte first. Here is its format (bit 15 is the most significant bit): bits contents ----- --------------------- 15-11 hour (0-23) 10-5 minute (0-59) 4-0 double seconds (0-29) Since there are not enough bits in the time field to express the time to the nearest second, the time is first rounded or truncated to the nearest even second. The resulting seconds field is divided by 2 and written into bits 4-0. (Whether DOS actually uses rounding or truncation undefined, but it scarcely matters.) The date field is a two-byte value, stored in "little endian" order, with the less significant byte first. Here is its format (bit 15 is the most significant bit): bits contents ----- ----------------------------------------------- 15-9 years elapsed since 1980 (0-127) 8-5 month (1=January, 2=February, ..., 12=December) 4-0 day (1-31) Hence the DOS world began on January 1, 1980 at 00:00:00 and will end on December 31, 2107 at 23:59:58. The very first byte of a directory entry has another function. If it is hexadecimal E5, the entry represents a deleted file, subdirectory or volume label. All other parts of the entry are valid and represent the status of the entry just before it was deleted. If the first byte is zero (not an ASCII zero, which has the value hexdecimal 30, but an ASCII NUL), the entry is a terminating entry. It has never been used since the directory was created, and there are no entries after it in the same directory. When DOS 3.0 came along, characters from the IBM extended character set were permitted in file names and extensions. Unfortunately, this created an ambiguity for hexadecimal E5, which represents an IBM extended character and also represents a deleted entry. For compatiblity between DOS 3.0 and disks formatted under previous versions of DOS, DOS 3.0 converts the extended character represented by hexadecimal E5 into 05 before writing it to the directory entry when it appears as the first character of a file name. Presumably, DOS gives the same treatment to the first character of a volume label. If a file contains no data, bytes 28-31 (number of bytes in file) are zero, which is quite logical. However, bytes 26 and 27 (number of first cluster) are also zero, although FFF8 would probably have been more logical. Bytes 28-31 (number of bytes in file) for a subdirectory entry are always zero. The length of a subdirectory is not recorded in its directory entry. However, a subdirectory always contains at least one cluster. Its length can be determined only by examining the chain of file allocation table entries. The first two directory entries in a subdirectory are always the special subdirectories "." and "..", respectively. Their time and date fields are the same as those of their parent (the subdirectory in which they lie). The first cluster number of the subdirectory "." is also the same as that of its parent. The first cluster number of the subdirectory ".." is the same as that of the parent of its parent, or zero if the parent or its parent is the root directory. Unused fields in a directory entry are set to zeros. What DOS will do otherwise is undefined. Also undefined is what DOS will do if it finds a volume label in a subdirectory, or if if fails to find the special subdirectories "." and "..". 12. Directory Protocol ---------------------- When a disk is formatted, every entry in the root directory is a terminating entry. When a subdirectory is created, it is immediately given one cluster. (If there are no free clusters, a subdirectory cannot be created and the operation fails.) The first two entries are filled with appropriate information for the special subdirectories "." and "..". All other entries become terminating entries. There cannot be two entries in the same directory with same name and extension unless they are deleted entries or one is a volume label and the other is not a volume label. If the creation of a new file or subdirectory would cause a conflict, DOS usually fails the operation. However, if a new file is created with the same name and extension as an existing file in the same directory, and the file mode permits, the new entry will simply replace the old one. A new volume label in the root directory simply replaces the old one, if any. When an entry is deleted, its first byte is replaced by hexedecimal E5 to mark it as deleted. The rest of the entry is left unchanged. If the entry represented a file or subdirectory, all clusters associated with it are marked as empty in the file allocation table. The data in the clusters is not erased. Other entries in the directory are not moved to fill the gap. In fact, so much information is left intact that a special utility may be able to restore almost all of the deleted file, subdirectory or volume label if it is invoked immediately, before anything else is written to the disk. The first character of the name is truly lost, but other information remains on the disk and can be restored if it can be accurately located. When a new entry is added to a directory, it is put in the first available location. If the directory contains any deleted entries, the new entry replaces the first deleted entry. Otherwise, the new entry replaces the first terminating entry. If the directory contains no deleted or terminating entry, what happens next depends on whether it is the root directory or a subdirectory. If it is the root directory, the operation fails because the root directory cannot be lengthened. Otherwise, the directory is lengthened by adding another cluster to its allocation chain in the the file allocation table. If there are no free clusters, the operation fails. Otherwise, the new entry is the first entry in the new cluster, and all other entries in the cluster become terminating entries. After a number of insertions and deletions, a directory can become quite fragmented. The directory search time can be increased substantially if there are a lot of deleted entries, especially near the beginning of the directory. Special utilities are available to clean up fragmented directories, although they may make it difficult or even impossible to restore deleted files or subdirectories.